453 research outputs found

    Utopia documents: linking scholarly literature with research data

    Motivation: In recent years, the gulf between the mass of accumulating research data and the massive literature describing and analyzing those data has widened. The need for intelligent tools to bridge this gap, to rescue the knowledge being systematically isolated in literature and data silos, is now widely acknowledged.

    COVID-19, A Global Health Concern Requiring Science-Based Solutions

    Governments, organizations, and individuals need concrete, science-based action points to reduce the spread, lessen the impact, ease the concerns of the wider population, and avoid further outbreaks.

    SSMap: A new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase

    Background: Sequences and structures provide valuable complementary information on protein features and functions. However, it is not always straightforward for users to gather information concurrently from the sequence and structure levels. The UniProt Knowledgebase (UniProtKB) strives to help users in this undertaking by providing complete cross-references to the Protein Data Bank (PDB) as well as coherent feature annotation using available structural information. In this study, SSMap, a new UniProt-PDB residue-residue level mapping, was generated. The primary objective of this mapping is not only to facilitate the two tasks mentioned above, but also to remedy a number of shortcomings of existing mappings. SSMap is the first isoform sequence-specific mapping resource and is up-to-date for UniProtKB annotation tasks. The method employed by SSMap differs from the other mapping resources in that it emphasizes the correct reconstruction of the PDB sequence from structures, and the correct attribution of a UniProtKB entry to each PDB chain through a series of post-processing steps. Results: SSMap was compared to other existing mapping resources in terms of the correctness of the attribution of PDB chains to UniProtKB entries, and of the quality of the pairwise alignments supporting the residue-residue mapping. It was found that SSMap shared about 80% of the mappings with other mapping sources. New and alternative mappings proposed by SSMap were mostly good as assessed by manual verification of data subsets. As for local pairwise alignments, it was shown that major discrepancies (both in terms of alignment lengths and boundaries), when present, were often due to differences in the methodologies used for the mappings. Conclusion: SSMap provides an independent, good-quality UniProt-PDB mapping. The systematic comparison conducted in this study allows the further identification of general problems in UniProt-PDB mappings so that both the coverage and the quality of the mappings can be systematically improved for the benefit of the scientific community. SSMap mapping is currently used to provide PDB cross-references in UniProtKB.
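
As a rough illustration of what a residue-residue mapping derived from a pairwise alignment looks like, the sketch below walks two gapped, already-aligned sequences and pairs up residue numbers. It is not SSMap's actual method: the function name `residue_mapping`, the toy sequences, and the 1-based numbering are assumptions made for illustration only.

```python
def residue_mapping(aligned_a, aligned_b, start_a=1, start_b=1):
    """Map residue numbers of sequence A to residue numbers of sequence B.

    Both inputs are rows of one pairwise alignment ('-' marks a gap).
    Only columns where both rows hold a residue produce a mapping entry.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    mapping = {}
    pos_a, pos_b = start_a, start_b
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a != "-" and res_b != "-":
            mapping[pos_a] = pos_b
        # Advance each counter only when its row consumed a residue.
        if res_a != "-":
            pos_a += 1
        if res_b != "-":
            pos_b += 1
    return mapping


# Hypothetical toy alignment, not taken from UniProtKB or PDB.
toy_mapping = residue_mapping("MKT-AY", "MK-SAY")
print(toy_mapping)
```

Residues that face a gap in the other row (here the third residue of each sequence) simply have no partner, which is one reason mappings built with different alignment methods can disagree on lengths and boundaries.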

    Dr. Zompo: an online data repository for Zostera marina and Posidonia oceanica ESTs

    As ecosystem engineers, seagrasses are angiosperms of paramount ecological importance in shallow shoreline habitats around the globe. Furthermore, the ancestors of independent seagrass lineages have secondarily returned into the sea in separate, independent evolutionary events. Thus, understanding the molecular adaptation of this clade contributes significantly not only to the field of ecology, but also to the principles of parallel evolution. With the use of Dr. Zompo, the first interactive seagrass sequence database presented here, new insights into molecular adaptation to marine environments can be inferred. The database is based on a total of 14 597 ESTs obtained from two seagrass species, Zostera marina and Posidonia oceanica, which have been processed, assembled and comprehensively annotated. Dr. Zompo provides experimentalists with a broad foundation to build experiments and consider challenges associated with the investigation of this class of non-domesticated monocotyledon systems. Our database, based on the Ruby on Rails framework, is rich in features including the retrieval of experimentally determined heat-responsive transcripts, mining for molecular markers (SSRs and SNPs), and weighted keyword searches that allow access to annotation gathered on several levels including Pfam domains, GeneOntology and KEGG pathways. Well-established plant genome sites such as The Arabidopsis Information Resource (TAIR) and the Rice Genome Annotation Project are interfaced by Dr. Zompo. With this project, we have initiated a valuable resource for plant biologists in general and the seagrass community in particular. The database is expected to grow together with more data to come in the near future, particularly with the recent initiation of the Zostera genome sequencing project.

    Accurate Prediction of DnaK-Peptide Binding via Homology Modelling and Experimental Data

    Molecular chaperones are essential elements of the protein quality control machinery that governs translocation and folding of nascent polypeptides, refolding and degradation of misfolded proteins, and activation of a wide range of client proteins. The prokaryotic heat-shock protein DnaK is the E. coli representative of the ubiquitous Hsp70 family, which specializes in the binding of exposed hydrophobic regions in unfolded polypeptides. Accurate prediction of DnaK binding sites in E. coli proteins is an essential prerequisite to understand the precise function of this chaperone and the properties of its substrate proteins. In order to map DnaK binding sites in protein sequences, we have developed an algorithm that combines sequence information from peptide binding experiments and structural parameters from homology modelling. We show that this combination significantly outperforms either single approach. The final predictor had a Matthews correlation coefficient (MCC) of 0.819 when assessed over the 144 tested peptide sequences to detect true positives and true negatives. To test the robustness of the learning set, we have conducted a simulated cross-validation, where we omit sequences from the learning sets and calculate the rate of repredicting them. This resulted in a surprisingly good MCC of 0.703. The algorithm was also able to perform equally well on a blind test set of binders and non-binders, of which there was no prior knowledge in the learning sets. The algorithm is freely available at http://limbo.vib.be
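
For readers unfamiliar with the Matthews correlation coefficient reported above, it is computed from the four cells of a binary confusion matrix as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal sketch of that formula; the function name and the example counts are hypothetical and are not taken from the paper's data.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]; by a common convention, 0.0 is returned
    when any marginal total is zero and the ratio is undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for illustration only (not the study's results).
example_mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(round(example_mcc, 3))
```

Unlike plain accuracy, the MCC stays informative when binders and non-binders are unevenly represented, which is presumably why it is a natural choice for a predictor of this kind.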

    Enabling comparative modeling of closely related genomes: Example genus Brucella

    For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this short report, we attempt to raise awareness of the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from trivial due to annotation inconsistencies. We propose a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of the genus Brucella, which contains pathogens of both humans and livestock. This study led to the identification and subsequent correction of inconsistent annotations in the SEED database, as well as the identification of 31 biochemical reactions that are common to Brucella but were not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database, and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support the accurate prediction of phenotypes such as pathogenicity, media requirements or type of respiration. We thank Jean Jacques Letesson, Maite Iriarte, Stephan Kohler and David O'Callaghan for their input on improving specific annotations. This project has been funded by the United States National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN272200900040C, awarded to BW Sobral, and from the United States National Science Foundation under Grant MCB-1153357, awarded to CS Henry. J.P.F. acknowledges funding from [FRH/BD/70824/2010] of the FCT (Portuguese Foundation for Science and Technology) Ph.D. scholarship.